11/30/2018
Probability model.
Model encodes our understanding of the scientific process of interest.
Model accounts for as much uncertainty as possible.
Model results in a probability distribution.
Update model with data.
Criticize the model.
Does the model fit the data well?
Do the predictions make sense?
Are there subsets of the data that don't fit the model well?
Make inference using the model.
Start with probability distributions:
\[ \begin{align*} \left[y_i | \boldsymbol{\theta} \right] & \sim \operatorname{N}(X_i \beta, \sigma^2) \\ \boldsymbol{\theta} & = (\beta, \sigma^2) \end{align*} \]
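As a minimal sketch of what this distribution means in practice, the Python below simulates data from the model and evaluates the log-likelihood \(\log [\mathbf{y} | \boldsymbol{\theta}]\). The design matrix, the parameter values, and the function names are made up for illustration.

```python
import math
import random

def simulate_linear(X, beta, sigma2, rng):
    """Draw y_i ~ N(X_i beta, sigma2) for each row X_i of the design matrix."""
    sd = math.sqrt(sigma2)
    return [rng.gauss(sum(x * b for x, b in zip(row, beta)), sd) for row in X]

def log_likelihood(y, X, beta, sigma2):
    """Sum of normal log densities log [y_i | theta], with theta = (beta, sigma2)."""
    total = 0.0
    for yi, row in zip(y, X):
        mean = sum(x * b for x, b in zip(row, beta))
        total += -0.5 * math.log(2 * math.pi * sigma2) - (yi - mean) ** 2 / (2 * sigma2)
    return total

rng = random.Random(1)
X = [[1.0, x / 10] for x in range(50)]      # intercept plus one covariate (made up)
beta_true, sigma2_true = [2.0, -1.0], 0.25  # illustrative parameter values
y = simulate_linear(X, beta_true, sigma2_true, rng)

# The generating parameters should score higher than a poor guess.
print(log_likelihood(y, X, beta_true, sigma2_true))
print(log_likelihood(y, X, [0.0, 0.0], sigma2_true))
```

Comparing the two printed values shows how the likelihood ranks candidate values of \(\boldsymbol{\theta}\) against the data.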
Hierarchical model:
A model built in components.
Each component represents a different statistical goal.
Break the model into components:
Data Model.
Process Model.
Prior Model.
\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]
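The factorization can be sketched in code as a sum of log densities, one term per component. The specific distributions below (normal data and process models, an exponential prior on \(\theta_D\), a normal prior on \(\theta_P\)) are assumptions chosen only to keep the example small, and both \(\theta_D\) and \(\theta_P\) are treated as scalars.

```python
import math
import random

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def log_posterior_unnormalized(z, theta_D, theta_P, y):
    """log [y | theta_D, z] + log [z | theta_P] + log [theta_D] + log [theta_P]."""
    if theta_D <= 0:
        return -math.inf                                             # outside the prior's support
    data = sum(log_normal_pdf(yi, zi, theta_D) for yi, zi in zip(y, z))  # data model
    process = sum(log_normal_pdf(zi, theta_P, 1.0) for zi in z)          # process model
    prior_D = -theta_D                                               # Exponential(1) prior (assumed)
    prior_P = log_normal_pdf(theta_P, 0.0, 10.0)                     # N(0, 10^2) prior (assumed)
    return data + process + prior_D + prior_P

# Simulate from the same hierarchy, top down: priors, then process, then data.
rng = random.Random(42)
theta_P = rng.gauss(0.0, 10.0)
theta_D = rng.expovariate(1.0)
z = [rng.gauss(theta_P, 1.0) for _ in range(5)]
y = [rng.gauss(zi, theta_D) for zi in z]

print(log_posterior_unnormalized(z, theta_D, theta_P, y))
```

Keeping each component as its own term means the data, process, or prior model can be swapped out without touching the others.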
\[ {\huge \begin{align*} \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} \end{align*} } \]
Age of minerals:
\(\mathbf{y}\) is the radio-date estimate.
\(\mathbf{z}\) is the true mineral age.
\(\boldsymbol{\theta}_D\) is the radio-date standard error.
The probability distribution is determined by the measurement process.
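As a sketch, assuming normally distributed measurement error and writing \(\theta_{D,i}\) for the reported standard error of mineral \(i\) (this indexing is introduced here for illustration), the mineral-age data model could be written as:

\[ \begin{align*} y_i | z_i, \theta_{D,i} & \sim \operatorname{N}(z_i, \theta_{D,i}^2), \quad i = 1, \ldots, n. \end{align*} \]

Under this assumption, a date reported as 100 ± 2 Ma (a made-up value) corresponds to \(y_i = 100\) and \(\theta_{D,i} = 2\).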
Reconstructing climate from tree rings
\(\mathbf{y}\) is the tree ring width increment.
\(\mathbf{z}\) is the true, unobserved climate variable.
\(\boldsymbol{\theta}_D\) models the relationship between tree ring width and climate, stand dynamics, individual heterogeneity, tree age, etc.
The probability distribution is determined by tree physiology, measurement uncertainty, etc.
\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]
\[ {\huge \begin{align*} \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]} \end{align*} } \]
Reconstructing climate with tree rings
Trees of the same species share a similar response to climate.
On average, climate variables at nearby sites are more similar than climate variables at sites far apart.
Climate variables separated by short periods of time are more similar than climate variables separated by long periods of time.
\(\mathbf{z}\) is the value of the unobserved climate variables.
\(\boldsymbol{\theta}_P\) are the species-specific growth responses and the correlation of climate across time and space.
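One simple process model with this temporal behavior is a first-order autoregressive, AR(1), series. This is an illustrative stand-in, not necessarily the process model used in the reconstruction. The correlation parameter `rho` plays the role of one element of \(\boldsymbol{\theta}_P\), and all values below are made up.

```python
import random

def simulate_ar1(n, rho, sd, rng):
    """z_t = rho * z_{t-1} + e_t with e_t ~ N(0, sd^2), started from the stationary distribution."""
    z = [rng.gauss(0.0, sd / (1 - rho ** 2) ** 0.5)]
    for _ in range(n - 1):
        z.append(rho * z[-1] + rng.gauss(0.0, sd))
    return z

def lag_correlation(z, lag):
    """Sample correlation between z_t and z_{t+lag}."""
    a, b = z[:-lag], z[lag:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(7)
z = simulate_ar1(n=5000, rho=0.9, sd=1.0, rng=rng)

# Values one step apart are strongly correlated; values 20 steps apart much less so.
print(lag_correlation(z, 1), lag_correlation(z, 20))
```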
\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P] \color{orange}{[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P]} \end{align*} } \]
Probability distributions define "reasonable" ranges for parameters.
\[ {\huge \begin{align*} \color{cyan}{[\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]
\[ {\huge \begin{align*} \color{cyan}{[\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} \end{align*} } \]
Probability distribution over all unknowns in the model.
Inference is made using the posterior distribution.
Because the posterior distribution is a probability distribution, uncertainty is easy to calculate.
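To illustrate, the sketch below summarizes a set of posterior draws for a single unknown with a mean, a standard deviation, and an equal-tailed 95% credible interval. The draws are simulated stand-ins for the output of a sampler, not results from any model in these notes, and `credible_interval` is a hypothetical helper.

```python
import random
import statistics

def credible_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted draws (simple empirical quantiles)."""
    s = sorted(draws)
    lo_idx = int((1 - level) / 2 * (len(s) - 1))
    hi_idx = int((1 + level) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]

rng = random.Random(3)
# Stand-in for posterior draws of one unknown (made up for illustration).
draws = [rng.gauss(3.0, 0.5) for _ in range(10000)]

post_mean = statistics.fmean(draws)
post_sd = statistics.stdev(draws)
lower, upper = credible_interval(draws)
print(post_mean, post_sd, (lower, upper))
```

The same draws can be reused for any other summary, for example the probability that the unknown exceeds a threshold.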
Climate change is well understood globally.
Climate change is less well understood locally.
Need for spatially explicit reconstructions of climate variables.
Problem: data sources are messy and noisy.
Vegetation composition and structure change from ice age to current period.
Using change in temperature to predict future vegetation change.
Data model: Multi-logit distribution for ordered categories of observed change.
Process model: Assumes increasing temperature results in smooth changes of composition and structure.
Prior model: Not used.
Sharman and Johnstone (2017). Sediment unmixing using detrital geochronology. Earth and Planetary Science Letters.
\[ \begin{align*} y_{ib} & \sim \operatorname{N}(z_{ib}, \sigma^2_{ib}). \\ y_{id} & \sim \operatorname{N}(z_{id}, \sigma^2_{id}). \end{align*} \]
Assumptions:
\[ \begin{align*} {z}_{ib} \sim \sum_{k=1}^\infty p_{bk} \operatorname{N}(\mu_k, \sigma^2_k). \end{align*} \]
\[ \begin{align*} z_{id} & \sim \sum_{b=1}^B \phi_b \sum_{k=1}^K p_{bk} \operatorname{N}(\mu_k, \sigma^2_k). \end{align*} \]
\[ \begin{align*} \phi_1 = 0.200 \quad\quad\,\,\, \phi_2 = 0.532 \quad\quad\,\,\,\,\, \phi_3 = 0.268 \,\,\quad\quad\quad \mbox{Daughter} \end{align*} \]
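A minimal simulation of the daughter model, using the mixing proportions \(\phi_b\) shown above. The component weights `p`, the means `mu`, the standard deviations `sigma`, and the measurement error `sigma_obs` are hypothetical values chosen only for illustration.

```python
import random

phi = [0.200, 0.532, 0.268]      # parent mixing proportions from the example above
p = [[0.7, 0.2, 0.1],            # p[b][k]: weight of component k in parent b (hypothetical)
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]
mu = [100.0, 500.0, 1200.0]      # component means (hypothetical ages)
sigma = [20.0, 50.0, 100.0]      # component standard deviations (hypothetical)

def draw_daughter(rng, sigma_obs=10.0):
    """Draw a parent b ~ phi, a component k ~ p[b], a true age z, and an observed age y."""
    b = rng.choices(range(len(phi)), weights=phi)[0]
    k = rng.choices(range(len(mu)), weights=p[b])[0]
    z = rng.gauss(mu[k], sigma[k])   # process model: true daughter age
    y = rng.gauss(z, sigma_obs)      # data model: measured age with error
    return b, z, y

rng = random.Random(2018)
draws = [draw_daughter(rng) for _ in range(20000)]

# The share of daughter grains traced to each parent should be close to phi.
fractions = [sum(1 for b, _, _ in draws if b == j) / len(draws) for j in range(len(phi))]
print(fractions)
```

In the inference problem the direction is reversed: the parent labels are unobserved, and \(\phi_b\) is estimated from the observed ages.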
\[ \begin{align*} y_{id} & \sim \operatorname{N}(z_{id}, \sigma^2_{id}). \end{align*} \]
\[ \begin{align*} z_{id} & \sim \sum_{b=1}^B \phi_{db} \sum_{k=1}^K p_{bk} \operatorname{N}(\mu_k, \sigma^2_k). \end{align*} \]
\[ \begin{align*} \sum_{k=1}^K I\{ \phi_b^{(k)} > 0.5 \}. \end{align*} \]
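One possible reading of this sum, which is an assumption because the indices are not defined here, is that \(k\) indexes posterior draws of \(\phi_b\) and \(K\) is the number of draws. Under that reading the sum counts the draws in which parent \(b\) contributes more than half of the daughter, and dividing by \(K\) gives a Monte Carlo estimate of the posterior probability. The draws below are simulated stand-ins, not real sampler output.

```python
import random

def dirichlet_draw(alpha, rng):
    """One draw from a Dirichlet(alpha) distribution via normalized gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(11)
K = 4000                                  # number of stand-in posterior draws
phi_draws = [dirichlet_draw([20.0, 53.0, 27.0], rng) for _ in range(K)]

b = 1                                     # zero-based index, i.e. phi_2
count = sum(1 for draw in phi_draws if draw[b] > 0.5)   # the indicator sum
prob = count / K                          # estimated P(phi_b > 0.5 | y)
print(count, prob)
```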
Account for spatial correlation among daughters.
Account for temporal correlation within a sediment core.